Encoding standards for large text resources: The Text Encoding Initiative
Abstract
The Text Encoding Initiative (TEI) is an international project established in 1988 to develop guidelines for the preparation and interchange of electronic texts for research, and to satisfy a broad range of uses by the language industries more generally. Standardized encoding practices have become increasingly critical as the demand to use and, most importantly, reuse vast amounts of electronic text has grown dramatically in both research and industry, in particular for natural language processing. In January 1994, the TEI issued its Guidelines for the Encoding and Interchange of Machine-Readable Texts, which provide standardized encoding conventions for a large range of text types and features relevant to a broad range of applications.
Similar resources
Representation schemes for language data: the Text Encoding Initiative and its potential impact for encoding African languages
The Text Encoding Initiative (TEI) Guidelines for the Encoding and Interchange of Machine-Readable Texts provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications. Given the potential challenges of encoding texts in the African languages, it will be important to establish collaboration between the TEI and projects encoding l...
A Corpus of Textual Revisions in Second Language Writing
This paper describes the creation of the first large-scale corpus containing drafts and final versions of essays written by non-native speakers, with the sentences aligned across different versions. Furthermore, the sentences in the drafts are annotated with comments from teachers. The corpus is intended to support research on textual revision by language learners, and how it is influenced by f...
Treating metadata as annotations: separating the content markup from the content
The use of digital learning resources creates an increasing need for semantic metadata, describing the whole resource, as well as parts of resources. Traditionally, schemas such as the Text Encoding Initiative (TEI) have been used to add semantic markup for parts of resources. This is not sufficient for use in a “metadata ecology”, where metadata is distributed, coherent to different Application Pr...
TEI P5 as an XML Standard for Treebank Encoding
The aim of the paper is to show that a subset of the Text Encoding Initiative Guidelines is a reasonable choice as a standard for stand-off XML encoding of syntactically annotated corpora. The proposed TEI schema, actually employed in the National Corpus of Polish, is compared to other such candidate standards, including TIGER-XML, SynAF and PAULA.
Lessons learned from using SGML in the Text Encoding Initiative
In April of 1994 the ACH-ALLC-ACL Text Encoding Initiative published Guidelines for Electronic Text Encoding and Interchange (Document TEI P3). SGML was used as the basis for the encoding scheme that was developed. Several innovative approaches to the use of SGML were devised during the course of the project. Three aspects of this innovation are documented in the paper. First, all of the tags a...
Publication date: 1994